Abstract:
The geolocation of data has become a key concern with the evolution of cloud computing. Although data migration is quite common and sometimes essential for load balancing or service guarantees, at times it creates a risk for the user and could even violate the service agreement. A malicious service provider could also relocate the data, which could jeopardize data privacy and security. In this paper, we introduce a novel algorithm called IGOD to geolocate a cloud data center; it can also be used to geolocate Internet nodes. IGOD efficiently geolocates any target data center with higher accuracy and at lower cost. It provides audit control and assurance against cloud storage providers that may move a customer's data around. We analyze and compare IGOD with currently available solutions for geolocating a target. We used PlanetLab to validate IGOD and establish its cost-effectiveness. To do so, we first use our own data collection to geolocate the test data center using emulation and compare IGOD's performance with other schemes. Finally, we use it to geolocate one of the Amazon S3 data centers. Our comparison shows that IGOD provides relatively higher location accuracy and is cost-effective (uses fewer resources).
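The abstract does not spell out IGOD's mechanism, but delay-based geolocation schemes of this family typically have landmarks with known coordinates measure round-trip times (RTTs) to the target, convert each delay into a distance upper bound, and search for a location consistent with the constraints. The sketch below illustrates that general idea only; the fibre speed factor, the coarse grid search, and all names are illustrative assumptions, not IGOD's actual algorithm.

import math

# km of one-way distance per ms of RTT, assuming signals travel at roughly
# 2/3 the speed of light in fibre, halved because RTT is a round trip.
C_FIBRE_KM_PER_MS = 299.792 * (2 / 3) / 2

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance in kilometres between two (lat, lon) points.
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def estimate_location(landmarks):
    # landmarks: list of (lat, lon, rtt_ms) tuples; returns the grid point
    # that satisfies the largest number of delay-derived distance bounds.
    best, best_score = None, -1
    for lat in range(-90, 91, 2):
        for lon in range(-180, 181, 2):
            score = sum(1 for llat, llon, rtt in landmarks
                        if haversine_km(lat, lon, llat, llon) <= rtt * C_FIBRE_KM_PER_MS)
            if score > best_score:
                best, best_score = (lat, lon), score
    return best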
Abstract:
One of the major offerings of cloud computing is its storage service. In spite of its many notable advantages, it also has drawbacks such as data security and data availability; these are the main issues a user faces. Many models have been proposed to solve these issues, using cryptographic methods to secure the data and data redundancy to ensure availability. Both methods solve the issues at the cost of extra storage space and increased time consumption on both the user and the server side. This paper proposes a model, PCSP (Protected Cloud Service Provider), which solves these issues in a novel way. The model uses lightweight techniques that do not employ cryptographic methods. PCSP uses a layered approach with three entities: the user, the PCSP, and the vendor. Owing to the lightweight techniques, execution time is reduced by 80% and the storage needed is reduced by 60%, leaving scope for still further reduction in storage space. The implementation and analysis serve as the proof of concept.
Abstract:
Big data is being produced at an aggressive rate, and with the merger of cloud computing and IoT, the huge volumes of data generated are increasingly challenging the storage capacity of data centres. This has led to a growing data-capacity gap in big data storage. Unfortunately, the limitations faced by current storage technologies have severely handicapped their potential to meet the storage demand of big data. Consequently, storage technologies with higher storage density, throughput, and lifetime have been researched to overcome this gap. In this paper, we first introduce the working principles of three such emerging storage technologies, and justify their inclusion in the study based on the tremendous advances they have seen in the recent past. These storage technologies are optical data storage, DNA data storage, and holographic data storage. We then evaluate the recent advances in the storage density, throughput, and lifetime of these emerging storage technologies, and compare them with the trends and advances in prevailing storage technologies. We finally discuss the implications of their adoption, evaluate their prospects, and highlight the challenges they face in bridging the data-capacity gap in big data storage. © 2018 Elsevier B.V. All rights reserved.
Abstract:
Companies are collecting and storing huge amounts of data, much of it redundant. Many organizations are turning to data deduplication to reduce these huge information volumes, as well as the equipment and operational costs they entail.
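As a concrete illustration of the technique this abstract refers to, a minimal fixed-size-chunk deduplicator stores each unique chunk once, keyed by its content hash, so repeated content costs only a reference. The chunk size and the in-memory store below are illustrative choices, not any particular vendor's design.

import hashlib

CHUNK = 4096
store = {}  # sha256 digest -> chunk bytes, each unique chunk stored once

def dedup_write(data: bytes):
    # Returns the "recipe" (ordered list of digests) needed to rebuild data.
    recipe = []
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # duplicate chunks add no new bytes
        recipe.append(digest)
    return recipe

def dedup_read(recipe):
    # Reassembles the original data from stored chunks.
    return b''.join(store[d] for d in recipe)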
Abstract:
In a cloud object storage system, all requests and data are routed through the proxy server, which limits system performance. Exploiting data correlations to prefetch objects can effectively alleviate stress on the proxy server and improve system performance. To achieve this goal, it is necessary to consider the extraction of data correlations, the distribution of correlated objects in the cluster, and the way correlated objects are stored. In this paper, we propose a policy-driven framework, Cora, to support data-correlation-based storage policies that efficiently maintain correlations and prefetch correlated objects in the cloud object storage system. Based on the characteristics observed in explicit and implicit data correlations, we design different storage policies with various scheme implementations. Experiments demonstrate that leveraging data correlations can bring significant performance improvements to the cloud object storage system: throughput improves by up to 285.24% and latency by up to 55.39%.
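A minimal sketch of what correlation-driven prefetching can look like: the proxy mines co-access patterns from the request stream and prefetches objects that frequently follow the one just requested. The window size, threshold, and class names are assumptions for illustration, not Cora's actual design.

from collections import defaultdict, deque

class CorrelationPrefetcher:
    def __init__(self, window=4, threshold=3):
        self.window = deque(maxlen=window)  # recently accessed object IDs
        self.counts = defaultdict(int)      # (a, b) -> times b followed a
        self.threshold = threshold

    def record_access(self, obj_id):
        # Count obj_id as a successor of every object in the recent window.
        for prev in self.window:
            if prev != obj_id:
                self.counts[(prev, obj_id)] += 1
        self.window.append(obj_id)

    def prefetch_candidates(self, obj_id):
        # Objects whose co-access count with obj_id passes the threshold.
        return [b for (a, b), n in self.counts.items()
                if a == obj_id and n >= self.threshold]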
Abstract:
Companies are generating data at a faster rate than ever before because of many factors, including the exponential growth of applications and Web-generated content. This has led to significant increases in storage requirements and a larger percentage of IT budgets being allocated to storage purchases. Storage specialists can play a decisive role in ensuring the company never faces the specter of losing critical data.
Abstract:
We present the design, implementation, and evaluation of INSTalytics, a co-designed stack of a cluster file system and the compute layer, for efficient big-data analytics in large-scale data centers. INSTalytics amplifies the well-known benefits of data partitioning in analytics systems; instead of traditional partitioning on one dimension, INSTalytics enables data to be simultaneously partitioned on four different dimensions at the same storage cost, enabling a larger fraction of queries to benefit from partition filtering and joins without network shuffle.
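As a toy illustration of the partition-filtering benefit the paper amplifies (single-dimension partitioning, shown here, is standard practice, not INSTalytics' four-dimensional layout):

from collections import defaultdict

def partition_by(rows, column):
    # Group rows into partitions keyed by the value of one column.
    parts = defaultdict(list)
    for row in rows:
        parts[row[column]].append(row)
    return parts

rows = [{'region': r, 'day': d} for r in ('eu', 'us') for d in range(3)]
by_region = partition_by(rows, 'region')

# A query filtering on region touches one partition instead of scanning all
# rows; with four partitioning dimensions, filters and joins on any of the
# four columns would get this benefit.
eu_rows = by_region['eu']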
Abstract:
Container-based Hadoop Distributed File System (HDFS) storage has been widely used in cloud data center networks, but traditional HDFS has a single-point-of-failure problem that can result in overall unavailability. In this paper, we study the storage reliability of a Docker-container-based HDFS cluster with a single point of failure. First, we investigate a data-volume-based persistence solution for Hadoop under the single-point-failure and single-backup strategy of an HDFS cluster. Second, we propose an HDFS-based replica placement algorithm for data storage that considers the performance of the host and container nodes. Third, we design the KADC-KNN data segmentation algorithm to effectively store the persistent data of the Docker container. Extensive experimental results show that this method can effectively ensure the stable storage and fast migration of cluster data. Compared with the state-of-the-art algorithm, the proposed data volume persistence algorithm DVPS improves data reliability by 19.8%, and the data partitioning algorithm KADC-KNN improves partitioning accuracy by 20.2% with lower time overhead.
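The abstract gives only the outline of the replica placement idea; the sketch below shows one plausible shape of performance-aware placement, ranking candidate container nodes by a combined score and spreading replicas across distinct hosts. The scoring weights and node fields are purely illustrative assumptions, not the paper's algorithm.

def place_replicas(nodes, replicas=3, w_cpu=0.5, w_io=0.5):
    # nodes: list of dicts with 'name', 'host', 'cpu_free', 'io_free'
    # (the last two normalised to [0, 1]); returns chosen node names.
    ranked = sorted(nodes,
                    key=lambda n: w_cpu * n['cpu_free'] + w_io * n['io_free'],
                    reverse=True)
    chosen, hosts = [], set()
    for node in ranked:
        if node['host'] not in hosts:  # spread replicas across hosts
            chosen.append(node['name'])
            hosts.add(node['host'])
        if len(chosen) == replicas:
            break
    return chosen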
Abstract:
Background: With its inherent high density and durable preservation, DNA has recently been recognized as a distinguished medium for storing enormous amounts of data over millennia. To overcome the limitations of a recently reported high-capacity DNA data storage scheme while achieving a competitive information capacity, we are inspired to explore a new coding system that facilitates the practical implementation of high-capacity DNA data storage.
Result: In this work, we devised and implemented a DNA data storage scheme with variable-length oligonucleotides (oligos), introducing a hybrid DNA mapping scheme that converts digital data to DNA records. The encoded DNA oligos store 1.98 bits per nucleotide (bits/nt) on average (approaching the upper bound of 2 bits/nt) while conforming to the biochemical constraints. Beyond that, an oligo-level repeat-accumulate coding scheme is employed to address data loss and corruption in the biochemical processes. In a wet-lab experiment, an error-free retrieval of 379.1 KB of data with a minimum coverage of 10x is achieved, validating the error resilience of the proposed coding scheme. Theoretical analysis shows that the proposed scheme exhibits a net information density (user bits per nucleotide) of 1.67 bits/nt while achieving 91% of the information capacity.
Conclusion: To advance towards practical implementations of DNA storage, we proposed and tested a DNA data storage system that combines a high-potential mapping (bits-to-nucleotide conversion) scheme with a low-redundancy yet highly efficient error-correction code design. The advances reported here move us closer to a practical high-capacity DNA data storage system. © The Author(s) 2019.
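A worked illustration of the 2 bits/nt ceiling mentioned above: with a four-letter alphabet {A, C, G, T}, each nucleotide can encode at most log2(4) = 2 bits. The fixed mapping below is the textbook baseline, not the paper's hybrid variable-length scheme, which additionally enforces biochemical constraints such as limits on homopolymer runs.

# Fixed 2-bit-per-nucleotide mapping: the information-theoretic maximum.
ENCODE = {'00': 'A', '01': 'C', '10': 'G', '11': 'T'}
DECODE = {v: k for k, v in ENCODE.items()}

def bits_to_dna(bits: str) -> str:
    assert len(bits) % 2 == 0, "pad the bit string to an even length"
    return ''.join(ENCODE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def dna_to_bits(oligo: str) -> str:
    return ''.join(DECODE[nt] for nt in oligo)

# Round trip: every pair of bits becomes exactly one nucleotide and back.
assert dna_to_bits(bits_to_dna('0110110001')) == '0110110001'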